Coding schemes for time encoded speech (TES) voice messages
Abstract
This paper reports an investigation into the use of Time-Encoded Speech (TES) [1] for the economical storage of digital voice messages in the tactical military arena. Initial results indicate that bit-rate reductions of between 20% and 55% may be available using simple coding schemes.
INTRODUCTION
The need for an economical digital description of the human speech waveform for applications in the voice message arena is well documented. Time-Encoding [1] appears to commend itself in this role. For simple TES schemes, high intelligibility and military quality may be obtained at bit rates of 12-16 kb/s [2] with time delays of 1-2 seconds [3]. These delays impose severe limitations for digital transmission but would appear to be insignificant for voice messaging systems. Further, since a variable (noisy) channel is absent, efficient coding schemes may be employed to reduce the number of bits per TES message [4], without the vulnerabilities usually associated with such codes.
Inspection of the TES symbol stream reveals a clustering of symbols associated with the acoustic event from which they were derived. This suggests that an efficient re-coding of the appropriate segment of the TES symbol stream may permit significant reductions to be achieved. For reconstruction and playback the coding process would be reversed to reproduce the original symbol stream. A detailed description of the TES coding format is reported in reference [5].
CODING SCHEMES
In the present investigations three simple segmentation protocols and four inter-frame coding options have been exercised in non-real-time on a 15.36 second sentence (see appendix). The procedures involved are described below:
SEGMENTATION
Fixed-length Segmentation: The TES symbol stream is partitioned into short (of the order of 10-20 ms) fixed-length time-frames.
Variable-length Segmentation: The TES symbol stream is partitioned into variable-length time-frames, where the boundaries of the time-frames are defined by an arbitrarily assigned maximum number Nmax of permitted elements within the frame. Typically, 7 < Nmax < 15.
Whole-message Segmentation: The TES symbol stream for the whole voice message is stored.
Within each time-frame, formed by any of the above schemes, there are Nz non-zero elements of the First Order Distribution (FOD), from an alphabet of NA possible elements. For fixed-length and whole-message time-frames Nz may approach NA, dependent upon the acoustic material in the utterance. For variable-length time-frames Nz may be arbitrarily restricted to an "aperture" Nmax to permit an efficient coding of the frame. The data frames consist of a total of NT symbols. NT will be dependent upon the acoustic material and the frame length.
The variable-length time-frame segmentation deployed in this investigation utilizes a "threshold" level, LT, to signal the point of segmentation. When LT equals zero and the (Nmax+1)th element of the FOD is non-zero, segmentation is complete and the next frame commences with the last symbol processed. If LT is non-zero then segmentation does not occur until the frequency of ...
1 Military Communications Research Group, School of Electrical Engineering and Science, RMCS (Cranfield), Shrivenham, Swindon, SN6 8LA.
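The variable-length segmentation rule described in the abstract for the case LT equal to zero can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `segment_variable_length`, the representation of TES symbols as plain integers, and the example stream are all hypothetical. The behaviour for non-zero LT is not covered, because the source description of that case is incomplete.

```python
from collections import Counter


def segment_variable_length(symbols, n_max):
    """Split a symbol stream into frames with at most n_max distinct symbols.

    Hypothetical helper illustrating the LT = 0 rule only: a frame is closed
    as soon as an (n_max + 1)th distinct symbol appears, and that symbol
    becomes the first symbol of the next frame.
    """
    if n_max < 1:
        raise ValueError("n_max must be at least 1")
    frames = []
    current = []      # symbols of the frame being built
    seen = set()      # distinct symbols (non-zero FOD elements) in that frame
    for symbol in symbols:
        if symbol not in seen and len(seen) == n_max:
            # The (n_max + 1)th FOD element would become non-zero:
            # close the frame and start the next one with this symbol.
            frames.append(current)
            current, seen = [], set()
        current.append(symbol)
        seen.add(symbol)
    if current:
        frames.append(current)
    return frames


# Assumed example stream of integer symbol codes (not taken from the paper).
stream = [3, 3, 5, 3, 7, 7, 9, 9, 3, 5]
for frame in segment_variable_length(stream, n_max=2):
    fod = Counter(frame)  # first-order distribution of the frame
    print(frame, "Nz =", len(fod), "NT =", len(frame))
```

Here Nz is the number of non-zero FOD elements in a frame and NT is the total number of symbols in it, following the definitions in the text. The aperture of 2 is chosen only to keep the example short; the text quotes 7 < Nmax < 15 as typical.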
Similar resources
A comparison of the performance of "normal" and "whispered" speech with simple time encoded digital speech (TES) direct voice input (DVI) systems in a tactical military environment
A preliminary investigation into the performance of a simple Time Encoded Speech (TES) isolated word recognition (IWR) direct voice input (DVI) system, using both normal and whispered (unvoiced) speech is described. Experimental conditions include evaluations with untrained military speakers in severe acoustic background noise (c. 70-90 dB SPL) with handheld omnidirectional microphones. System...
Excitation Codebook Design for Coding of the Singing Voice
The technique of Code Excited Linear Prediction (CELP) has led to the development of voice coding systems that provide toll quality speech at very low bitrates. While speech and singing share many similarities in terms of production, standard speech coding implementations fall far short when transmitting the singing voice. This paper explores the reasons for this discrepancy and suggests new va...
Automatic topic detection of recorded voice messages
We present an approach to automatic classification of spontaneously spoken voice messages. During overload periods at call-centers customers are offered a call-back at a later time. A speech dialog asks them to describe their concern on a voice box. The identified topics correspond to the supported service categories, which in turn determine the agent group the customer message is routed to. Ou...
Voice-related quality of life (V-RQOL) outcomes in laryngectomees.
BACKGROUND Laryngeal cancer has a significant impact on patients. This study compared the Voice-Related Quality of Life (V-RQOL) outcomes specific to 3 different postlaryngectomy voice rehabilitation methods. METHODS We conducted a retrospective review of 75 patients with laryngectomy from our V-RQOL questionnaire database. RESULTS The database included 18 electrolaryngeal speech (ELS), 15 ...
Speech Enhancement using an Adaptive Gain Equalizer
This paper presents a noise reduction method for speech communication where the input signal is divided into a number of subbands that are individually weighted in the time domain according to the short-time Signal-to-Noise Ratio (SNR) estimate in each subband at every time instant. Instead of focusing on suppressing the noise, the method focuses on speech enhancement. The method has proven to ...